Rule Induction for Sentence Reduction
Abstract
Automatic sentence reduction has been an active research topic, and several relevant approaches have recently been proposed. However, in our view many milestones must still be reached before sentence simplification approaches human-like quality. In this work, we propose a new framework that processes large sets of web news stories and learns sentence reduction rules in a fully automated and unsupervised way. This is our main contribution. Our system is conceptually composed of several modules. In the first module, the system automatically extracts paraphrases from online news stories, using new lexically based functions that we have proposed. In the second module, the extracted paraphrases are transformed into aligned paraphrases: the words of the two paraphrase sentences are aligned by DNA-like sequence alignment algorithms that have been adapted for aligning sequences of words. These alignments are then explored, and specific text structures called bubbles are selected. Afterwards, these structures are transformed into learning instances and passed to the last module, which uses Inductive Logic Programming techniques to learn the sentence reduction rules. Results show that this is a good approach for learning automatic sentence reduction, although some pertinent issues still need future investigation.
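The alignment and bubble-selection steps can be illustrated with a minimal sketch. It uses plain Needleman-Wunsch global alignment over word sequences, which is an assumption: the abstract does not specify which alignment algorithm was adapted or how it scores matches. The function names `align_words` and `find_bubbles`, the scoring values, and the working definition of a bubble (a run of non-matching columns flanked on both sides by identical aligned words) are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]
Bubble = Tuple[str, List[str], List[str], str]


def align_words(a: List[str], b: List[str],
                match: int = 1, mismatch: int = -3, gap: int = -1) -> List[Pair]:
    """Globally align two word sequences (Needleman-Wunsch).

    The scores are illustrative assumptions; mismatch is set low so that
    two gaps are preferred over pairing two different words.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))  # word of `a` aligned to a gap
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # word of `b` aligned to a gap
            j -= 1
    pairs.reverse()
    return pairs


def find_bubbles(pairs: List[Pair]) -> List[Bubble]:
    """Return (left context, segment of a, segment of b, right context)
    for every run of non-matching columns enclosed by matching words."""
    bubbles: List[Bubble] = []
    k = 0
    while k < len(pairs):
        if pairs[k][0] == pairs[k][1]:
            k += 1
            continue
        start = k
        while k < len(pairs) and pairs[k][0] != pairs[k][1]:
            k += 1
        if start > 0 and k < len(pairs):  # keep only runs with both contexts
            seg_a = [x for x, _ in pairs[start:k] if x is not None]
            seg_b = [y for _, y in pairs[start:k] if y is not None]
            bubbles.append((pairs[start - 1][0], seg_a, seg_b, pairs[k][0]))
    return bubbles


long_sentence = "the president of the united states spoke yesterday".split()
short_sentence = "the president spoke yesterday".split()
bubbles = find_bubbles(align_words(long_sentence, short_sentence))
```

In this toy pair, the only bubble is the segment "of the united states", which appears in the longer sentence between "president" and "spoke" and has no counterpart in the shorter one. Under the assumptions above, such a segment together with its context is the kind of structure that could be turned into a learning instance for a reduction rule.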
Similar resources
Hierarchical Maximum Pattern Matching with Rule Induction Approach for Sentence Parsing
Chinese parsing has been a highly active research area in recent years. This paper describes a hierarchical maximum pattern matching method integrated with a rule induction approach for sentence parsing on a Traditional Chinese parsing task. We analyzed and extracted statistical POS (part-of-speech) tagging information from the training corpus, then used that information for labeling unknown words in...
Suffering from Illness and Euthanasia sentence
Perhaps the most appropriate translation proposed for euthanasia is painless and compassionate killing. Because the effective components of committing a crime are present, it is considered complicity in murder, and the victim's consent affects neither the nature of the criminal act nor the criminal liability of the person who takes the life. One of the issues related to this killing which is disagr...
Design and Implementation of an Intelligent Part of Speech Generator
The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...
Example-Based Sentence Reduction Using Hidden Markov Model
Sentence reduction is the problem of removing redundant words or phrases from an input sentence to create a new sentence in which the gist of the original meaning is unchanged. Almost all previous methods required a syntactic parser before reducing a sentence. However, these methods were difficult to apply to a language for which no reliable parser was available. In this paper, we pro...
Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents
We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...